System and method for paper web time-to-break prediction

ABSTRACT

System and method for generating a time-to-break prediction for a paper web in a paper machine. This invention uses principal components analysis, neuro-fuzzy systems and trending analysis to form a model for predicting the time-to-break of the paper web from paper mill measurements of paper machine process variables. The model is used to isolate the root cause of the predicted web break.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation-in-part of U.S. patent application Ser. No. 09/583,155, entitled “System And Method For Paper Web Time-To-Break Prediction”, filed May 30, 2000, which claims the benefit of U.S. Provisional Application Serial No. 60/154,127 filed on Sep. 15, 1999, entitled “Methods For Predicting Time-To-Break Wet-End Web In Paper Mills Using Principal Components Analysis, Neurofuzzy Systems And Trending Analysis”.

BACKGROUND OF THE INVENTION

[0002] This invention relates generally to a paper mill, and more particularly, to a system and method for predicting web break sensitivity in a paper machine and isolating machine variables affecting the predicted web break sensitivity according to data obtained from the paper mill.

[0003] A paper mill is a highly complex industrial facility that comprises a multitude of equipment and processes. In a typical paper mill there is an area for receiving raw material used to make the paper. The raw material generally comprises wood in the form of logs that are soaked in water and tumbled in slatted metal drums to remove the bark. The debarked logs are then fed into a chipper, a device with a rotating steel blade that cuts the wood into pieces about ⅛″ thick and ½″ square. The wood chips are then stored in a pile. A conveyor carries the wood chips from the pile to a digester, which removes lignin and other components of the wood from the cellulose fibers, which will be used to make paper. In particular, the digester receives the chips and mixes them with cooking chemicals, which are called “white liquor”. As the chips and liquor move down through the digester, the lignin and other components are dissolved, and the cellulose fibers are released as pulp. At the bottom of the digester, the pulp is rinsed, and the spent chemicals known as “black liquor” are separated and recycled.

[0004] Next, the pulp is cleaned for a first time and then screened. Uncooked knots and wood chips, which cannot be passed through the screen, are returned to the digester to be cooked again. As for the screened pulp, it is cleaned a second time to obtain a virgin, unbleached pulp. The effluent from the second cleaning is then used for screening, and goes back to the first cleaning station before it is used in the digester. The used water ends its journey in a waste water primary treatment unit located in another location within the paper mill.

[0005] At this point, the pulp is free of lignin, but is too dark to use for most grades of paper. The next step is therefore to bleach the pulp by treating it with chlorine, chlorine dioxide, ozone, peroxide, or any of several other treatments. A typical paper mill uses multiple stages of bleaching, often with different treatments in each step, to produce a bright white pulp. Next, refiners, vessels with a series of rotating serrated metal disks, are used to beat the pulp for various lengths of time depending on its origin and the type of paper product that will be made from it. Basically, the refiners serve to improve drainability. Next, a blender and circulator mix the pulp with additives and distribute the mix of papermaking fibers to a paper machine.

[0006] The paper machine generally comprises a wet-end section, a press section, and a dry-end section. At the wet-end section, the papermaking fibers are uniformly distributed onto a moving forming wire. The moving wire forms the fibers into a sheet and enables pulp furnish to drain by gravity and dewater by suction. The sheet enters the press section and is conveyed through a series of presses where additional water is removed and the web is consolidated (i.e., the fibers are forced into more intimate contact). At the dry-end section, most of the remaining water in the web is evaporated and fiber bonding develops as the paper contacts a series of steam-heated cylinders. The web is then pressed between metal rolls to reduce thickness and smooth the surface and wound onto a reel.

[0007] A problem associated with this type of paper machine is that the paper web is prone to break at both the wet-end section of the machine and at the dry-end section. Web breaks at the wet-end section, which typically occur at or near the site of its center roll, occur more often than breaks at the dry-end section. Dry-end breaks are relatively better understood, while wet-end breaks are harder to explain in terms of causes and are harder to predict and/or control. Web breaks at the wet-end section can occur as many as 15 times in a single day. Typically, for a fully-operational paper machine there may be as many as 35 web breaks at the wet-end section of the paper machine in a month. The average production time lost as a result of these web breaks is about 1.6 hours per day. Considering that each paper machine operates continuously 24 hours a day, 365 days a year, the downtime associated with the web breaks translates to about 6.66% of the paper machine's annual production, which results in a significant reduction in revenue to a paper manufacturer. Therefore, there is a need to reduce the number of web breaks occurring in the paper machine, especially at the wet-end section.

BRIEF SUMMARY OF THE INVENTION

[0008] This invention provides a system and method for predicting a time-to-break for a paper web in either the wet-end section or the dry-end section of the paper machine using a variety of data obtained from the paper mill. In addition, this invention is able to isolate the root cause of the predicted web break. Thus, in this invention, there is provided a paper mill database containing a plurality of measurements obtained from the paper mill. Each of the plurality of measurements relates to a paper machine process variable. A processor processes each of the plurality of measurements into a modified principal components data set. A break predictor, responsive to the processor, predicts a paper web time-to-break within the paper machine from the plurality of processed measurements.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIG. 1 shows a schematic diagram of a typical paper mill;

[0010] FIG. 2 shows a schematic diagram of a paper machine according to the prior art that is typically used in the paper mill shown in FIG. 1;

[0011] FIG. 3 shows a schematic of a paper machine used in this invention;

[0012] FIG. 4 is a flow chart setting forth the steps used in this invention to predict a paper web time-to-break in a paper machine and isolate the root cause of the break;

[0013] FIG. 5 is a flow chart setting forth the steps used to train and test the predictive model in this invention;

[0014] FIG. 6 is a plot of time-to-break versus time for the actual time-to-break and the predicted time-to-break, and illustrating upper and lower control limits and the prediction error at various points, as utilized in the present invention;

[0015] FIG. 7 is a flow chart setting forth the steps used in this invention to acquire historical web break data and preprocess the data;

[0016] FIG. 8 is a flow chart setting forth the steps used in this invention to perform data scrubbing on the acquired historical data;

[0017] FIG. 9 is a flow chart setting forth the steps used in this invention to perform data segmentation on the acquired historical data;

[0018] FIG. 10 is a graph for one preferred embodiment of the segmentation of the break positive data by time-series;

[0019] FIG. 11 is a flow chart setting forth the steps used in this invention to perform variable selection on the acquired historical data;

[0020] FIG. 12 is a graph for one preferred embodiment of variable selection by visualization of mean shift;

[0021] FIG. 13 is a flow chart setting forth the steps used in this invention to perform principal components analysis (PCA) on the acquired historical data;

[0022] FIG. 14 is a graph for one preferred embodiment of the time-series data of the first three principal components of a representative break trajectory;

[0023] FIG. 15 is a flow chart setting forth the steps used in this invention to perform value transformation of the time-series data for the selected principal components;

[0024] FIG. 16 is a graph for one preferred embodiment of the filtered time-series data of the first three principal components of FIG. 14;

[0025] FIG. 17 is a graph for one preferred embodiment of the smoothed, filtered time-series data of the first three principal components of FIG. 16;

[0026] FIG. 18 is a flow chart setting forth the steps used in this invention to further prepare the data, and train and test the predictive model of the present invention;

[0027] FIG. 19 is a schematic representation of a neuro-fuzzy system used in this invention;

[0028] FIG. 20 is a set of graphs of actual time-to-break, time-to-break prediction, and moving average time-to-break prediction of four representative break trajectories;

[0029] FIG. 21 is a set of histograms illustrating various prediction performance analysis techniques for a high energy group of data;

[0030] FIG. 22 is a set of histograms illustrating various prediction performance analysis techniques for a mix energy group of data; and

[0031] FIG. 23 is a set of histograms illustrating various prediction performance analysis techniques for a low energy group of data.

DETAILED DESCRIPTION OF THE INVENTION

[0032] FIG. 1 shows a schematic diagram of a typical paper mill 300. In the paper mill 300, a debarker 302 receives logs that have been soaked in water and removes the bark from the logs using slatted metal drums. The debarked logs are then fed into a chipper 304, which cuts the logs into small wood chips. The wood chips are then stored in a pile 306. A conveyor 308 carries the wood chips from the pile to a digester 310, which mixes the chips with the white liquor cooking chemicals. As the chips and liquor move down through the digester, lignin and other components are dissolved, and the cellulose fibers are released as pulp. The digester then empties the pulp into a blow pit 312. A washer 314 removes the pulp from the blow pit 312, rinses it, and separates and recycles the black liquor.

[0033] Next, the pulp is cleaned for a first time at a screening station (not shown). Uncooked knots and wood chips, which cannot pass through the screen, are returned to the digester for additional cooking. As for the screened pulp, it is cleaned a second time to obtain a virgin, unbleached pulp. A bleach tower 316 then receives the unbleached pulp and treats it with chemicals such as chlorine, chlorine dioxide, ozone, peroxide, etc., to produce a bright white pulp. Next, a beater 318 beats the pulp for a predetermined period of time and a refiner 320 then further refines the pulp. Next, a blender and circulator 322 mix the pulp with additives and distribute the mix of papermaking fibers to a paper machine. The paper machine comprises equipment such as a headbox 20, a wire 22, presses 34, dryers 36, calenders 38 and a reel 40, all of which are explained below in more detail. One of ordinary skill in the art will recognize that the paper mill 300 may have additional equipment and processes other than the ones shown in FIG. 1.

[0034] FIG. 2 shows a schematic diagram of a paper machine 10 according to the prior art that is typically used in the paper mill 300 shown in FIG. 1. The paper machine 10 comprises a wet-end section 12, a press section 14, and a dry-end section 16. At the wet-end section 12, a flowspreader 18 distributes papermaking fibers (i.e., a pulp furnish of fibers and filler slurry) uniformly across the machine from the back to the front. The papermaking fibers travel to a headbox 20 which is a pressurized flowbox. The pulp furnish is jetted from the headbox 20 onto a moving paper surface 22, which is an endless moving wire. The top section of the wire 22, referred to as the forming section, carries the pulp furnish. Underneath the forming section are many stationary drainage elements 24 which assist in drainage. As the wire 22 with pulp furnish travels across a series of hydrofoils or table rolls 26, white water drains from the pulp by gravity and pulsation forces generated by the drainage elements 24. Furnish consistency increases gradually and dewatering becomes more difficult as the wire 22 travels further downstream. Vacuum assisted hydrofoils 28 are used to sustain higher drainage and then high vacuum flat boxes 30 are used to remove as much water as possible. A suction couch roll 32 provides suction forces to improve water removal.

[0035] The sheet is then transferred from the wet-end section 12 to the press section 14 where the sheet is conveyed through a series of presses 34 where additional water is removed and the web is consolidated. In particular, the series of presses 34 force the fibers into intimate contact so that there is good fiber-to-fiber bonding. In addition, the presses 34 provide surface smoothness, reduce bulk, and promote higher wet web strength for good runnability in the dry-end section 16. At the dry-end section 16, most of the remaining water in the web is evaporated and fiber bonding develops as the paper contacts a series of steam-heated cylinders 36. The cylinders 36 are referred to as dryer drums or cans. The dryer cans 36 are mounted in two horizontal rows such that the web can be wrapped around one in the top row and then around one in the bottom row. The web travels back and forth between the two rows of dryers until it is dry. After the web has been dried, the web is transferred to a calender section 38 where it is pressed between metal rolls to reduce thickness and smooth the surface. The web is then wound onto a reel 40.

[0036] As mentioned earlier, the conventional paper machine is plagued with paper web breaks at both the wet-end section of the machine and at the dry-end section. FIG. 3 shows a schematic of a system 41 that is capable of predicting paper web breaks and isolating the root causes for the breaks from data obtained throughout the paper mill 300 with which the paper machine operates. In addition to elements described with respect to FIG. 2, the paper machine 42 comprises a plurality of sensors 44 for obtaining various measurements throughout the wet-end section 12, the press section 14, and the dry-end section 16. There are hundreds of different types of sensors (e.g., thermocouples, conductivity sensors, flow rate sensors) located throughout the paper machine 42. For example, there may be as many as 374 sensors located throughout the wet-end section of the paper machine 42. For ease of illustration, the sensors 44 are shown in FIG. 3 as substantially the same symbol even though there are many different types of sensors used that are typically designated by different configurations. Each sensor 44 obtains a different measurement that relates to a paper machine variable. Some examples of the type of measurements that may be taken are chemical pulp feed, wire speed, wire pit temperature, wire water pH, and ash content. Note that these measurements are only possible examples of some of the measurements obtained by the sensors 44 and this invention is not limited thereto.

[0037] A computer 46, coupled to the paper machine 42, receives each of the measurements obtained from the sensors 44. The computer 46 stores the measurements in a paper mill database 55, which places the measurements in a paper machine database 57. The paper mill database 55 also comprises other databases such as a raw materials database 59, a preprocess database 61, an operator shift database 63 and a maintenance schedule database 69. The raw materials database 59 stores data on the raw materials used to make the paper that include but are not limited to TMP, kraft, raw broke, coated broke, and chemicals. The preprocess database 61 stores data measured during the preprocessing stages of the raw material such as the screening, cleaning, refining, blending, etc. Some of the preprocess data include, but are not limited to, solution pH, percentages of raw materials, etc. The operator shift database 63 stores data relating to the different shifts of operation of the paper machine, such as hours since the time of the last shift change. The maintenance schedule database 69 stores data on the maintenance performed on the paper machine (e.g., hours of operation since last blade change). All of the data in these databases are inputted automatically or manually using well known methods. Furthermore, the paper mill database 55 is not limited to these specific databases and can include other databases that store data obtained from any of the equipment and processes located within the paper mill 300.

[0038] The computer 46 preprocesses selected ones of the measurements stored in the paper mill database 55 and analyzes the preprocessed measurements according to a software-based predictive model 47 stored within its memory to determine a time-to-break of the paper web, which may be displayed by the computer. FIG. 4 is a flow chart setting forth the steps used by the computer in this invention to predict the paper web time-to-break in the paper machine 42 and to isolate the root cause of the break after the predictive model is sufficiently trained and tested. In FIG. 4, the paper mill measurements are read throughout the paper mill at 48. Each of the readings relates to a paper machine variable identified as a principal component affecting web breakage. As will be explained below, in one preferred embodiment, only about 3 input variables are used from 43 possible readings. Those skilled in the art will realize that more or fewer input variables may be used in conjunction with this invention. After obtaining the readings, the measurements are sent to the computer 46 at 50. The computer then preprocesses the measurements into a modified break sensitivity data set, including modified principal components, at 52. In particular, in one preferred embodiment described in detail below, each of the measurements is transformed into principal components, clustered, normalized, transformed again and shuffled in preparation for use by a predictive model. This preprocessing generally reduces noise in the data and enhances the features of the data, thereby improving the signal-to-noise ratio of the data. After preprocessing, the computer 46 applies the predictive model to the preprocessed measurements at 54. In particular, the computer 46 uses a predictive modeling tool such as a neuro-fuzzy system to continually predict the time-to-break of the paper web from the incoming paper machine variables at 56. For example, the system may make a prediction over a predetermined time period, such as one prediction every 5 minutes. However, this prediction is not utilized until a trending analysis is performed to adjust the prediction for consistency with prior predictions at 58, as is explained below. Once a consistent trend is determined, a final prediction is made from the adjusted prediction at 60. The process repeats itself such that the final prediction is updated at the predetermined time period by other consistent predictions. Additionally, a performance evaluation of the final prediction is performed at 51 to measure the quality of the prediction. Depending on the results of the performance evaluation, at 53 the parameters of the neuro-fuzzy system may be adjusted to improve the accuracy of the prediction through a feedback mechanism, such as by modifying the software based on its output. Next, the neuro-fuzzy system is applied at 65 and its rule set is used to isolate the root cause of the predicted web break at 67. In isolating the root cause, the model outputs explanatory rules that link paper machine variables measured throughout the paper mill to the predicted break sensitivity. The neuro-fuzzy system and the derived rules are described below in more detail. Thus, the output of the neuro-fuzzy system can be used as a proactive warning of a web break for use in taking corrective action to isolate the root cause of the predicted web break and reduce the probability of a web break.
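The trending step at 58 can be pictured with a small sketch. The text here does not specify how predictions are reconciled with prior ones; the moving average below is an assumption suggested by the "moving average time-to-break prediction" of FIG. 20, and the class name `TrendSmoother` and the window of three predictions are hypothetical.

```python
from collections import deque


class TrendSmoother:
    """Hypothetical helper: smooths raw time-to-break predictions (minutes)
    with a moving average over the most recent predictions.
    The moving average and the window size are illustrative assumptions."""

    def __init__(self, window=3):
        # Keep only the last `window` raw predictions.
        self.recent = deque(maxlen=window)

    def update(self, raw_prediction_min):
        """Record a new raw prediction and return the smoothed value."""
        self.recent.append(raw_prediction_min)
        return sum(self.recent) / len(self.recent)


if __name__ == "__main__":
    smoother = TrendSmoother(window=3)
    # One raw prediction every 5 minutes, as in the example period above.
    for raw in (90, 80, 70, 60):
        print(raw, smoother.update(raw))
```

A short window keeps the final prediction responsive, while a longer one suppresses isolated outliers at the cost of lag.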

[0039] In operation, it was found that a preferred method of alerting the operator about the advent of a higher break probability or break sensitivity is to use a stoplight metaphor, which consists of interpreting the output of the time-to-break predictor. When the time-to-break prediction enters the range of about 90 to about 60 minutes, an alert such as a yellow light is provided, indicating a possible increase in break sensitivity. When the predicted time-to-break value enters the range of about 60 to about 0 minutes, an alarm such as a red light is provided to warn of the imminent potential for a break. As one skilled in the art will realize, many other time ranges and alerts may be utilized, such as audible, tactile and other visual indicators.
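A minimal sketch of the stoplight mapping follows, using the 90-minute and 60-minute thresholds from the example above. The function name `stoplight`, the "green" state for predictions above 90 minutes, and the handling of the exact boundary values are assumptions made for illustration.

```python
def stoplight(predicted_ttb_min, yellow_at=90.0, red_at=60.0):
    """Map a predicted time-to-break (minutes) to an alert color.

    Thresholds follow the example ranges in the text; treating exactly
    60 minutes as red and anything above 90 minutes as green are
    assumptions of this sketch.
    """
    if predicted_ttb_min <= red_at:
        return "red"     # imminent potential for a break
    if predicted_ttb_min <= yellow_at:
        return "yellow"  # possible increase in break sensitivity
    return "green"       # no alert


if __name__ == "__main__":
    for ttb in (120, 75, 30):
        print(ttb, stoplight(ttb))
```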

[0040] In order for this invention to be able to predict the time-to-break of the paper web and to isolate the root cause of the web break, the computer 46 containing the neuro-fuzzy system is trained and tested with historical web break data. For example, in one preferred embodiment, about 67% of the historical data is used for training and about 33% is used for testing. One skilled in the art will realize that these percentages may vary dramatically and still produce acceptable results. A flow chart describing the training and testing steps performed in this invention is set forth in FIG. 5. At 62, the historical data set is divided into two parts, a training set and a testing set. The training set is used to train the neuro-fuzzy system to predict the time-to-break and the testing set is used to test the prediction performance of the system when presented with a new data set. If the training is successful, then the model is expected to do reasonably well for a data set that it has never seen before. At 64, the training set is used to train the system to predict the time-to-break of the paper web. In this invention, the neuro-fuzzy system is trained by using the process described below in detail. Once the system is developed from the training set, the testing set is utilized to test how well the trained system predicts the time-to-break at 66. The testing is measured by calculating a prediction error, E(t). The prediction error is defined as: E(t) = Actual-time-to-break(t) − Predicted-time-to-break(t). If the trained system does predict the time-to-break with minimal error (e.g., −20 minutes < E(60) < 40 minutes) at 68, then the system is ready to be used on-line at 70 to predict the break sensitivity. However, if the trained system is unable to predict the time-to-break with minimal error at 68, then the system is adjusted at 72 and steps 64-68 are repeated until the error becomes small enough. The adjustments to the system at 72 involve changing the parameters of the neuro-fuzzy system, such as the number of inputs and/or the number of membership functions per input.
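The split at 62 and the error measure used at 66 can be sketched as follows. This is an illustration only: the function names are hypothetical, and shuffling and splitting whole break trajectories (rather than individual observations) with a fixed seed is an assumption that the text does not state.

```python
import random


def split_trajectories(trajectories, train_fraction=0.67, seed=0):
    """Divide historical break trajectories into training and testing sets.

    Splitting by trajectory and shuffling with a fixed seed are
    assumptions of this sketch; the text only gives the ~67%/33% ratio.
    """
    shuffled = list(trajectories)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def prediction_error(actual_ttb_min, predicted_ttb_min):
    """E(t) = actual time-to-break(t) - predicted time-to-break(t), in minutes."""
    return actual_ttb_min - predicted_ttb_min


if __name__ == "__main__":
    train, test = split_trajectories(range(100))
    print(len(train), len(test))
    # Predicted 60 minutes when 90 minutes actually remain: positive (early) error.
    print(prediction_error(90, 60))
```

With this sign convention a positive E(t) corresponds to an early prediction and a negative E(t) to a late one.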

[0041] In determining the prediction error, E(t), any number of ranges of prediction error at given times, t, may be utilized, depending on the particular paper machine and the given process variables. Clearly the best prediction occurs when the error between the real and the predicted time-to-break is zero. However, the utility of the error is not symmetric with respect to zero. For instance, if the prediction is too early (e.g., predicted time-to-break=60 minutes but actual time-to-break=90 minutes), then the prediction is providing more lead-time than needed to verify the potential for break, monitor the various process variables, and perform a corrective action. On the other hand, if the prediction is too late (e.g., predicted time-to-break=90 minutes but actual time-to-break=60 minutes), then this error reduces the time available to assess the situation and take a corrective action. Given the same error size, it is preferable to have a positive bias (early prediction), rather than a negative one (late prediction). Nevertheless, there should be a limit on how early a prediction can be and still be useful.

[0042] Therefore, in the preferred embodiment, boundaries are established for the maximum acceptable late prediction and the maximum acceptable early prediction. Any prediction outside of these boundaries will be considered a false prediction. For example, referring to FIG. 6, a predetermined useful prediction window is defined about the actual time-to-break line 61 for the predicted time-to-break line 63, having a late limit 65 outside which late predictions or false negatives occur, resulting in not enough time to take action, and an early limit 67 outside which early predictions or false positives occur, resulting in premature warning that may cause too many corrections. These extremes of false predictions, False Negatives (FN) and False Positives (FP), may be defined as follows. A False Negative (sometimes referred to as a missing prediction) occurs when no predictions are made or when the predicted time-to-break is later than the actual time-to-break by more than a predetermined late time period (e.g., 20 minutes). A False Positive (commonly referred to as a false alarm) occurs when the predicted time-to-break is earlier than the actual time-to-break by more than a predetermined early time period (e.g., 40 minutes). This is considered to be excessive lead-time, which might lead to unnecessary corrections. In the preferred embodiment, the following limits are defined as the maximum allowed deviations from the origin, where the origin equals the actual time-to-break line:

[0043] FN: E(60) < −20 minutes: The system fails to correctly predict a break if the predicted time-to-break is more than 20 minutes later than the actual time-to-break. Note that if the prediction is later than 60 minutes, this is equivalent to not making any prediction and having the break occur.

[0044] FP: E(60) > 40 minutes: The system fails to correctly predict a break if the predicted time-to-break is more than 40 minutes earlier than the actual time-to-break.
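The FN and FP limits can be expressed as a small classifier. The sketch below takes E = actual − predicted and reads the FN limit as E < −20 minutes, which is what the wording "more than 20 minutes later" implies; the function name, the labels it returns, and the use of `None` for a missing prediction are illustrative assumptions.

```python
def classify_prediction(actual_ttb_min, predicted_ttb_min,
                        late_limit_min=20.0, early_limit_min=40.0):
    """Classify one prediction against the useful prediction window.

    E = actual - predicted (minutes). E < -late_limit is a false negative
    (too late), E > early_limit is a false positive (too early).
    A missing prediction (None) is also treated as a false negative.
    """
    if predicted_ttb_min is None:
        return "false negative"
    error = actual_ttb_min - predicted_ttb_min
    if error < -late_limit_min:
        return "false negative"
    if error > early_limit_min:
        return "false positive"
    return "acceptable"


if __name__ == "__main__":
    print(classify_prediction(60, 50))   # 10 minutes early
    print(classify_prediction(60, 90))   # 30 minutes late
    print(classify_prediction(60, 10))   # 50 minutes early
```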

[0045] Although these are subjective boundaries, they reflect the greater usefulness of having earlier rather than later warnings/alarms.

[0046] Additionally, after the break predictor model 47 is trained to predict the time-to-break, a software-based fault isolator model 49 within the computer is trained and tested with the historical data to derive a set of rules that can explain the root cause of any predicted time-to-break. The derivation of the rules from the neuro-fuzzy system may be utilized to pinpoint process variables, related to the readings, that are responsible for the predicted paper web break.

[0047] FIG. 7 describes the historical web break data acquisition steps and the data preprocessing steps that are used in this invention for training. At 74, data from the paper mill including the paper machine described in FIG. 3 is collected over a predetermined time period. In the preferred embodiment, data collection may focus on one area of the paper mill. After the historical data has been collected, then a data reduction process is applied at 76 to render the historical data suitable for model building purposes. In the preferred embodiment, the data reduction is subdivided into a data scrubbing process and a data segmentation process. Following the data reduction, a variable reduction technique is utilized at 78 in order to derive a simple, yet robust, predictive model. In the preferred embodiment, the variable reduction is subdivided into a variable selection process and a principal components analysis process, as is discussed below in detail. Once the amount of data and the number of variables are reduced, then the data is further segmented to develop local models and modified in preparation for use by the neuro-fuzzy system at 80. The further segmentation and modification of the data is discussed below in detail. This data is processed by the neuro-fuzzy system to generate a predictive model at 82. This predictive model is used to predict a time-to-break that is compared to prior predictions in a trend analysis process, resulting in a final predicted time-to-break at 84. Thus, the data acquisition and training results in a predetermined number of local models for continually predicting the time-to-break of a paper web based on the incoming paper mill variable measurements.

[0048] The data gathering and model generation process will now be described in detail with reference to a preferred embodiment. Those skilled in the art will realize that the principles taught herein may be applied to other embodiments. As such, the present invention is not limited to this preferred embodiment. In one preferred embodiment, paper mill data are collected over about a twelve-month period. Note that this time period is illustrative of a preferred time period for collecting a sufficient amount of data and this invention is not limited thereto. Additional variables associated with the paper mill measurements include two variables corresponding to date and time information and one variable indicating a web break. By using a sampling time of one minute, this data collection results in about 66,240 data points during a 24-hour period of operation (1,440 one-minute observations of 46 variables), and a very large data set over the twelve-month period.

[0049] Referring to FIG. 8, for example, the data scrubbing portion of the data reduction involves grouping the data according to various break trajectories. A break trajectory is defined as a multivariate time-series starting at a normal operating condition and ending at a wet-end break. For example, a long break trajectory could last up to a couple of days, while a short break trajectory could be less than three hours long.

[0050] A predetermined number of web breaks are identified at 86. In the preferred embodiment, all of the web breaks are identified, although a smaller sample size may be used. For each web break, a trajectory of data is created over a predetermined window at 88. In the preferred embodiment, the trajectory of data is created in a 60-minute window ending with the break. These trajectories are grouped by a predetermined type of break, and one of the groups may be selected for further processing at 90. For example, in the preferred embodiment there are four major groups of breaks; however, only breaks corresponding to situations defined as “Unknown Causes” were evaluated. The other major groups include breaks with known causes, which are less suitable for predictive modeling. As a result, data relating to the known causes groups are taken out of the analysis. Thus, for example, the historical data can be reduced to 433 break trajectories, containing 443,273 observations and 46 variables.

[0051] Once the data relating to a selected group of trajectories, such as unknown causes, is defined, the selected break trajectory data is divided into a predetermined number of groups at 92. For example, the data may be divided into two groups to distinguish data associated with an imminent break from data associated with a stable operation. One skilled in the art will realize, however, that the data may be grouped in numerous other gradations in relation to the break. Utilizing two groups, the first group contains the set of observations taken within a predetermined pre-break to break time window, such as 60 minutes prior to the break to the moment of the break. This data set is denoted as break positive data and, in the preferred embodiment, contains 199,377 observations and 46 variables. The remaining data set, containing the set of observations greater than 60 minutes prior to the break, is denoted as break negative data. In the preferred embodiment, the break negative data contains 243,896 observations and 46 variables. The data collected after the moment of the break is discarded, since it is already known that the web has broken.
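The two-group division can be sketched as below, assuming each observation carries a timestamp in minutes. The dictionary layout, the key `"t"`, and the function name are hypothetical; only the 60-minute window and the discarding of post-break data come from the text.

```python
def segment_observations(observations, break_time_min, window_min=60):
    """Split one trajectory into break positive and break negative data.

    `observations` is a list of dicts with a hypothetical key "t"
    (timestamp in minutes). Observations within `window_min` minutes
    before the break are break positive, earlier ones are break
    negative, and anything after the break is discarded.
    """
    positive, negative = [], []
    for obs in observations:
        minutes_to_break = break_time_min - obs["t"]
        if minutes_to_break < 0:
            continue  # collected after the break: discarded
        if minutes_to_break <= window_min:
            positive.append(obs)
        else:
            negative.append(obs)
    return positive, negative


if __name__ == "__main__":
    data = [{"t": t} for t in (0, 100, 140, 200, 210)]
    pos, neg = segment_observations(data, break_time_min=200)
    print([o["t"] for o in pos], [o["t"] for o in neg])
```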

[0052] In the break negative data, a break tendency indicator variable is added to the data and assigned a value of 0 at 94. The break indicator value of 0 denotes that a break did not occur within the data set. Further, any incomplete observations and obviously missing values are deleted at 96. Additionally, the break negative data is merged with data representing a paper grade variable at 98. For example, in a preferred embodiment, this yields a final set of break negative data containing 233,626 observations and 47 variables.

[0053] In the break positive data, a predetermined break sensitivity indicator variable is added to the data at 100. For example, using the 60 minute pre-break to break time window, the break sensitivity indicator is assigned a value of 0.1, 0.5 or 0.9, respectively, corresponding to the first, middle or last 20 minutes of the break trajectory. These break sensitivity indicator values represent a low, medium and high break possibility, respectively. As one skilled in the art will realize, the number and value of the break sensitivity indicators may vary based on the application. Further, any incomplete observations and obviously missing values are deleted at 96. Also, only the first data point corresponding to the break is included in the data set for each break trajectory. This allows each break trajectory data set to include only relevant data prior to the break. Additionally, the break positive data is merged with data representing a paper grade variable at 98. For example, this yields a final set of break positive data containing 26,453 observations and 47 variables. Thus, by performing data scrubbing, two data sets (break positive data and break negative data) are created and are used throughout the remainder of the process.
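By way of illustration only, the labeling rule for the break negative and break positive data may be sketched as follows. The function name `break_indicator`, the representation of each observation by its number of minutes before the break, and the treatment of observations falling exactly on a 20-minute boundary are assumptions made for this sketch and are not taken from the preferred embodiment.

```python
def break_indicator(minutes_before_break, window=60.0):
    """Return the break indicator value for a single observation.

    minutes_before_break: time remaining until the break, in minutes.
    Observations older than the pre-break window are break negative (0).
    """
    if minutes_before_break < 0:
        # Data collected after the moment of the break is discarded.
        raise ValueError("observation lies after the break")
    if minutes_before_break > window:
        return 0.0           # break negative data
    third = window / 3.0     # 20 minutes for a 60-minute window
    if minutes_before_break > 2 * third:
        return 0.1           # first 20 minutes of the trajectory: low
    if minutes_before_break > third:
        return 0.5           # middle 20 minutes: medium
    return 0.9               # last 20 minutes before the break: high
```

For example, an observation taken 55 minutes before the break receives the value 0.1, and one taken 5 minutes before the break receives 0.9.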

[0054] As one skilled in the art will realize, some of the common steps outlined above, such as deleting observations and merging paper grade information, may be performed in any order and prior to dividing the data sets into break positive and break negative data.

[0055] After the data scrubbing 85, a data segmentation 101 is performed. Referring to FIG. 9, both the break positive and break negative data are segmented according to paper grade at 102, since different grades of paper may exhibit different break characteristics. In the preferred embodiment, for example, a paper grade denoted as RSV656 is selected and the break positive data originally containing 443 break trajectories and 26,453 observations (representing numerous paper grades) are segmented into 131 break trajectories and 7,348 observations relating to the RSV656 paper grade. Similarly, the break negative data containing 233,626 observations are segmented to 59,923 observations relating to the RSV656 paper grade.

[0056] The break positive data are preferably further segmented by time-series analysis at 104. Because each break trajectory is a multivariate time-series containing a large amount of data, it is preferred to summarize each break trajectory by a single number to aid in the segmentation process. Before this analysis, however, a preliminary variable selection may be performed, including knowledge engineering, visualization and CART. As one skilled in the art will realize, the segmentation by time-series analysis and variable selection may be performed in any order. The variable selection process is described below in more detail. Although all of the readings could be used, in the preferred embodiment only 31 variables (out of 43 readings) are needed to distinguish the unusual trajectories. The unusual trajectories, which represent “outlier” trajectories that are significantly different from the majority of trajectories, are distinguished from the data set at 106 as a result of the time-series segmentation process. The following is a description of the algorithm for a preferred time-series segmentation process.

[0057] For each break trajectory

[0058] For each reading

[0059] Build an autoregressive model-AR(1).

[0060] End of “for” loop.

[0061] (At this point, there are 31 AR(1) models; hence 31 corresponding coefficients).

[0062] Compute the geometric mean of the 31 AR(1) coefficients.

[0063] End of “for” loop.

[0064] The autoregressive model for each reading is of order 1 according to the following equation: x(t)=αx(t−1)+ε; where x(t)=the reading indexed by time; α=a coefficient relating the current reading to the reading from the previous time step; x(t−1)=the reading from the previous time step; and ε=an error term. The idea is to summarize each multivariate time-series by a single number, which is the geometric mean of the AR(1) coefficients of the individual univariate time-series of the break trajectory. Referring to FIG. 10, the geometric means of the AR(1) coefficients 103 from a representative plurality of break trajectories are shown in graphical form.
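The summarization of a break trajectory by a single number may be sketched as follows, purely as an illustration. The least-squares estimate of α without an intercept term, the use of coefficient magnitudes so that the geometric mean is defined, and the function names `ar1_coefficient` and `trajectory_summary` are assumptions of this sketch rather than details of the preferred embodiment.

```python
import math

def ar1_coefficient(series):
    """Least-squares estimate of alpha in x(t) = alpha * x(t-1) + eps."""
    num = sum(cur * prev for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    if den == 0:
        raise ValueError("all lagged values are zero; alpha is undefined")
    return num / den

def trajectory_summary(readings):
    """Geometric mean of the AR(1) coefficients of all readings.

    readings: one list of floats per reading of the break trajectory.
    Assumption: magnitudes are used so the logarithm is defined; a
    coefficient of exactly zero is not handled in this sketch.
    """
    coeffs = [abs(ar1_coefficient(series)) for series in readings]
    return math.exp(sum(math.log(c) for c in coeffs) / len(coeffs))
```

Sorting the trajectories by this summary value would give a curve of the kind described with reference to FIG. 10, from which unusual trajectories may be separated.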

[0065] Once the break trajectories are summarized by a single number, they may be segmented into a predetermined number of groups in order to aid in modeling. For example, in a preferred embodiment, the break trajectories are divided into two groups. Referring to FIG. 10, one group consists of the first 11 break trajectories (the curved portion of the line) while the other group comprises the rest of the break trajectories. As one skilled in the art will realize, the number of predetermined groups and the point of division of the groups are subjective decisions that may vary from one data set to the next. In the preferred embodiment, for example, the first 11 break trajectories are all very fragmented. They correspond to an “avalanche of breaks,” e.g., trajectories occurring one after another having lengths much shorter than 60 minutes (the one-hour time window that immediately follows a break), and therefore these unusual trajectories are removed from the data set used for model building at 108. Thus, for example, the data segmentation results in the break positive data for the RSV656 paper grade having 120 break trajectories and 6,999 observations.

[0066] Once the data reduction 76 (FIG. 7) has been completed, a variable reduction process 78 (FIG. 7) is initiated to derive the simplest possible model to explain the past (training mode) and predict the future (testing mode). Typically, the complexity of a model increases in a nonlinear way with the number of inputs used by the model. High complexity models tend to be excellent in training mode, but rather brittle in testing mode. Usually, these high complexity models tend to overfit the training data and do not generalize well to new situations, which is referred to as “lack of model robustness.” There is a modeling bias in favor of smaller models, thereby trading the potential ability to discover better fitting models in exchange for protection from overfitting. From the implementation point of view, the risk of more variables in the model is not limited to the danger of overfitting. It also involves the risk of more sensors malfunctioning and misleading the model predictions. In an academic setting, the risk/return tradeoff may be more tilted toward risk taking for higher potential accuracy in predicting future outcomes. Therefore, a reduction in the number of variables and its associated reduction of inputs is desired to derive simpler, more robust models.

[0067] Further, in the presence of noise it is desirable to use as few variables as possible, while predicting well. This is often referred to as the “principle of parsimony.” There may be combinations (linear or nonlinear) of variables that are actually irrelevant to the underlying process but that, due to noise in the data, appear to increase the prediction accuracy. The idea is to use combinations of various techniques to select the variables with the greater discrimination power in break prediction.

[0068] The variable reduction activity is subdivided into two steps, variable selection 109 and principal component analysis (PCA) 143, which are described below. Referring to FIG. 11, a number of techniques may be used for variable selection. They include performing knowledge engineering at 110, visualization at 112, CART at 114, logistic regression at 116, and other similar techniques. These techniques may be used individually, or preferably in combination, to select variables having greater discrimination power in predicting web breakage.

[0069] In the preferred embodiment, for example, by utilizing knowledge engineering all of the sensors relating to variables corresponding to paper stickiness and paper strength are identified at 118. In the preferred embodiment, it has been determined that paper stickiness and paper strength are important variables that affect web breakage. This results in selecting 16 readings and their associated variables at 120.

[0070] Visualization, for example, includes segmenting the break trajectories at 122 into four groups or modalities: break negative, break positive (low), break positive (medium) and break positive (high). The modalities of the break positive data correspond to the break tendency indicator variable of 0.1, 0.5 and 0.9 discussed above. A comparison of the mean of each modality within each break trajectory is performed for each variable at 124. As a result, variables having significant mean shifts between modalities are identified and selected at 126 and 120. In the preferred embodiment, referring to FIG. 12, the visualization technique 129 plots the mean 131 for each reading by modality 133, resulting in selecting another eight readings.

[0071] Further, in the preferred embodiment, another five readings are added utilizing classification and regression trees (CART). CART is used for variable selection as follows. Assume there are N input variables (the readings) and one output variable (the web break status, i.e. break or non-break). The following is an algorithm describing the variable selection process:

[0072] For each input variable:

[0073] Construct a tree model with the single input variable and the output variable at 128.

[0074] Let the tree grow until the size of each terminal node is no smaller than about 1/100 of the original data set at 130.

[0075] Prune the tree until the number of terminal nodes is around 10 at 132.

[0076] Compute the misclassification rate, which is the sum of the number of false positives and false negatives, of the tree model at 134.

[0077] End of “for” loop.

[0078] (At this point, there are N tree models. Each tree has around 10 terminal nodes.)

[0079] Rank the N tree models by ascending order of their misclassification rates at 136.

[0080] Select the top 20 trees and their input variables at 138.

[0081] The basic idea is to use the misclassification rate as a measure of the discrimination power of each input variable, given the same size of tree for each input variable. As one skilled in the art will realize, the size of the tree, the pruning of the tree and selection of the top trees all include a predetermined number that may vary between applications, and this invention is not limited to the above-mentioned predetermined numbers. As a result of CART, five more variables not previously identified are selected at 120, making a total of 29 variables. As mentioned before, these 29 variables are used for time-series analysis based segmentation at 101 (FIGS. 7 and 9).

[0082] Another method to identify web break discriminating variables is logistic regression. For example, a stepwise logistic regression model may be fitted to the break positive data at 140. As a result, significant variables may be identified at 142 by examining variables included in the final logistic regression models. One skilled in the art will realize that other types of variable classification techniques may be utilized, such as multivariate adaptive regression splines (“MARS”) and neural networks (“NN”). In the preferred embodiment, utilizing logistic regression results in a model that identifies two significant variables: “broke to broke screen” and “headbox ash consistency.” Therefore, these variables are selected at 120 and the total number of variables is 31. A list of readings and variable selection methods, in one preferred embodiment, is set forth below in Table 1.

TABLE 1 Summary of variable selection (√ = selected, X = dropped).

ID    Reading ID      Status  Reason  Meaning
s1    P26FFC_1083     √               TMP feed, flow
s2    P26FFC_1085     √               Chemical pulp feed
s3    P26FFC_1084     √               Broke feed
s4    P26FIC_1279     √               Filler to centrifugal cleaner pump
s5    P26FFC_1753     √               Clay flow
s6    P26NIC_1051     √               Broke to broke screen
s7    P26FFC_1084_T   √               Broke percentage
s8    P26FFC_1004_1   √               Bleached TMP percentage
s9    P26NI_1518_11   √               Total retention
s10   P26NI_1518_12   √               Ash retention
s11   P26QR_1033      √               Chemical pulp freeness
s12   P26QI_1018      √               Chemical pulp pH
s13   P26QI_1017      √               Chemical pulp conductivity
s14   P26QI_1016      √               TMP conductivity
s15   P26QI_1014      √               Broke conductivity
s16   P26QIC_1278     √               Wire water pH
s17   P26TIC_1272     √               Wire pit temperature
s18   P26QI_1516      √               Headbox conductivity
s19   P26FIC_1721     √               Retention aid flow
s20   P26TIA_1778     √               Retention aid/dilution tank
s21   P26HIC_1716     √               Foam inhibitor flow to wire pit
s22   P26GI_2204      √               Slice lip position
s23   PK6_SELXD_4     √               Wire section speed
s24   PK6_ACCXD_18    √               Ash content
s25   PK6_ACCXD_22    √               K-moisture
s26   P26QI_1013      √               White water pH
s27   P26TI_1062      √               White water tower temperature
s28   P26LIC_1005     √               TMP proportioning chest
s29   P26QIC_1240     √               Air content (conrex)
s30   P26NI_1518_2    √               Headbox ash consistency
s31   P26QI_1015      √               Broke pH
s32   P26FFC_1752     X       2       Caoline flow
s33   P26NIC_1006     X       3,4     TMP feed, consistency
s34   P26NIC_1023     X       3,4     Chemical pulp feed, consistency
s35   P26FFC_1085_T   X       3,4     Chemical pulp percentage
s36   P26NI_1276      X       3,4     Machine pulp
s37   P26QI_1009      X       3,4     TMP 1 tower pH
s38   P26QIC_1010     X       3,4     TMP 2 tower pH
s39   P26PIS_1723     X       2       Retention aid pipe pressure before screen
s40   P26FI_0221_1    X       1       Outer wire, wire water
s41   PK6_SELXD_23    X       3,4     Draw difference 4th press − 1st drier-section
s42   T13FFC_6068     X       2       Alkaline feed
s43   PK6_SELXD_22    X       3,4     Draw difference 3rd − 4th press

[0083] For example, of the 43 potential readings, a total of 12 were dropped for one or more of the following reasons, corresponding to “Reason To Drop” in Table 1: (1) too many missing observations in paper grade RSV656 data; (2) too many missing observations; (3) misclassification rate is too high; and (4) the means among the low, medium and high groups are too close together.

[0084] The variables identified utilizing the variable selection techniques are then utilized for principal components analysis (PCA). PCA is concerned with explaining the variance-covariance structure through linear combinations of the original variables. PCA's general objectives are data reduction and data interpretation. Although p components are required to reproduce the total system variability, often much of this variability can be accounted for by a smaller number of the principal components (k<<p). In such a case, there is almost as much information in the first k components as there is in the original p variables. The k principal components can then replace the initial p variables, and the original data set, consisting of n measurements on p variables, is reduced to one consisting of n measurements on k principal components.

[0085] An analysis of principal components often reveals relationships that were not previously suspected and thereby allows interpretations that would not ordinarily result. Geometrically, this process corresponds to rotating the original p-dimensional space with a linear transformation, and then selecting only the first k dimensions of the new space. More specifically, the principal components transformation is a linear transformation which uses input data statistics to define a rotation of original data in such a way that the new axes are orthogonal to each other and point in the direction of decreasing order of the variances. The transformed components are totally uncorrelated.

[0086] Referring to FIG. 13, there are a number of steps in principal components transformation:

[0087] Calculation of a covariance or correlation matrix using the selected variables data at 144.

[0088] Calculation of the eigenvalues and eigenvectors of the matrix at 146.

[0089] Calculation of principal components and ranking of the principal components based on eigenvalues at 148, where the eigenvalues are an indication of variability in each eigenvector direction.
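The steps above may be sketched, for illustration only, in a few lines of dependency-free code. The sketch computes a sample covariance matrix and then finds only the leading eigenvalue and eigenvector by power iteration; the choice of power iteration, the fixed iteration count and the function names `covariance_matrix` and `leading_eigenpair` are assumptions of this sketch. In practice a numerical library's eigen-solver would ordinarily be used to obtain all components at once.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of rows (n observations by p variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows)
             / (n - 1) for j in range(p)] for i in range(p)]

def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    p = len(matrix)
    vec = [1.0] * p
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in nxt) ** 0.5
        if norm == 0:
            raise ValueError("matrix maps the start vector to zero")
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                for i in range(p))
    return value, vec

def project(rows, vec):
    """Scores of mean-centered observations on one principal direction."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [sum((row[j] - means[j]) * vec[j] for j in range(p)) for row in rows]
```

The eigenvalue returned is the variance of the scores produced by `project`, which is the sense in which eigenvalues indicate variability along each eigenvector direction.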

[0090] In building a model, therefore, the number of variables identified by the variable selection techniques can be reduced to a predetermined number of principal components. In the preferred embodiment, the first three principal components are utilized to build the model, a reduction in dimensionality from 31 readings to three principal components. Note that the above reduction comes from both variable selection and PCA.

[0091] In the preferred embodiment, two experiments are performed for the computation of the principal components. First, all 31 variables from the variable selection technique are utilized, including their associated break positive data, and the coefficients obtained in the PCA are identified. Then, a smaller subset of a predetermined number of variables (16 in this case) is selected at 150 by eliminating variables (15 in this case) whose coefficients were too small to be significant. Then another PCA is performed at 152 utilizing this smaller subset. This result is summarized in Table 2.

TABLE 2 Principal components analysis of 16 break positive sensors.

Principal Component   Eigenvalue   Proportion   Cumulative
PRIN1                 14.42        90.14%       90.14%
PRIN2                 0.49         3.07%        93.20%
PRIN3                 0.32         1.98%        95.19%
PRIN4                 0.25         1.57%        96.76%
PRIN5                 0.18         1.10%        97.85%
PRIN6                 0.08         0.51%        98.37%
PRIN7                 0.06         0.38%        98.75%
PRIN8                 0.05         0.34%        99.09%
PRIN9                 0.04         0.24%        99.33%
PRIN10                0.03         0.22%        99.55%
PRIN11                0.03         0.16%        99.71%
PRIN12                0.02         0.11%        99.82%
PRIN13                0.01         0.08%        99.90%
PRIN14                0.01         0.05%        99.95%
PRIN15                0.01         0.04%        100.00%
PRIN16                0.00         0.00%        100.00%

[0092] From the first row of Table 2, in the preferred embodiment, the first principal component explains 90% of the total sample variance. Further, the first six principal components explain over 98% of the total sample variance. Thus, a predetermined number of the top-ranked principal components, and their associated data, are selected at 154. Consequently, in the preferred embodiment, it is determined that sample variation may be summarized by the first three principal components and that a reduction in the data from 16 variables to three principal components is reasonable. As one skilled in the art will realize, any predetermined number of principal components may be selected, depending on the number of variables desired and the amount of variance desired to be explained by the variables.
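The relationship between the eigenvalues and the explained variance can be checked with a short calculation, sketched here for illustration. The eigenvalues are entered as printed in Table 2, that is, rounded to two decimals, so the computed proportions may differ from the tabulated ones in the last digit; the function names `cumulative_proportions` and `components_needed` are assumptions of this sketch.

```python
# Eigenvalues as printed (rounded) in Table 2, PRIN1 through PRIN16.
TABLE_2_EIGENVALUES = [14.42, 0.49, 0.32, 0.25, 0.18, 0.08, 0.06, 0.05,
                       0.04, 0.03, 0.03, 0.02, 0.01, 0.01, 0.01, 0.00]

def cumulative_proportions(eigenvalues):
    """Cumulative share of total variance explained by the first k components."""
    total = sum(eigenvalues)
    shares, running = [], 0.0
    for value in eigenvalues:
        running += value / total
        shares.append(running)
    return shares

def components_needed(eigenvalues, target):
    """Smallest k whose first k components explain at least `target` variance."""
    for k, share in enumerate(cumulative_proportions(eigenvalues), start=1):
        if share >= target:
            return k
    return len(eigenvalues)
```

With these rounded eigenvalues, three components are needed to reach 95% of the variance and six to reach 98%, in agreement with the figures discussed above.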

[0093] As a result of the principal component analysis, the time-series of the first three principal components for each break trajectory may be generated. FIG. 14 represents a plot of the time-series of the first three principal components 151, 153 and 155 for a representative break trajectory.

[0094] Once the principal components are identified, value transformation techniques 80 are applied to the principal components data in order to build the predictive model. The main purpose of value transformation is to remove noise, reduce data size by compression, and smooth the resulting time-series to identify and highlight their general patterns (i.e., velocity, acceleration, etc.). This goal is achieved by using typical signal-processing algorithms, such as a median filter and a rectangular filter.

[0095] Referring to FIG. 15, the time-series data for each selected principal component is identified at 156. Noise in each set of time-series data is suppressed to form a noise-suppressed time-series data set at 158. Then each noise-suppressed time-series data set is compressed to form a compressed, suppressed time-series data set at 160. For example, a value transformation using a median filter serves two purposes: it filters out noise and compresses data. This results in summarizing a block of data into a single, representative point. FIG. 16 shows the filtered time-series plot of the three principal components 165, 167 and 169 of the representative break trajectory of FIG. 14. Note that the window size of the median filter is three. Further, additional filters may be applied to smooth the data to form a smoothed, compressed, suppressed time-series data set at 162. For example, a rectangular moving filter may be applied across the sequence of the three principal components in steps of one. This results in smoothing the data and canceling out noise. FIG. 17 shows the smoothed, filtered time-series plot of the three principal components 171, 173 and 175 of the representative break trajectory of FIGS. 14 and 16. Note that the window size of the rectangular filter is five.
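The two filters may be sketched as follows, as an illustration only. It is assumed here that the median filter operates on non-overlapping blocks, so that each block of three points is summarized by one point, that an incomplete trailing block is dropped, and that the rectangular filter uses equal weights without padding at the ends of the series. The function names `block_median` and `moving_average` are likewise assumptions of this sketch.

```python
from statistics import median

def block_median(series, window=3):
    """Summarize each non-overlapping block of `window` points by its median."""
    return [median(series[i:i + window])
            for i in range(0, len(series) - window + 1, window)]

def moving_average(series, window=5):
    """Rectangular (equal-weight) filter advanced in steps of one."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def transform(series):
    """Median-filter and compress, then smooth, one principal component."""
    return moving_average(block_median(series, window=3), window=5)
```

Under these assumptions, a series of 180 one-minute points is compressed to 60 points by the median filter and then smoothed to 56 points by the rectangular filter.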

[0096] Referring to FIG. 18, the predictive model generation, training and testing further includes grouping or clustering the principal components break trajectory data by energy content at 164 in order to determine separate predictive models. For example, one method of clustering the principal components break trajectory data is by sorting based on the mean of the first principal component. As one skilled in the art will realize, other methods of sorting the break trajectories into different modalities may be utilized, such as by taking the median of the first principal component or by utilizing a combination of mean and standard deviation. Alternatively, rather than utilizing a number of predictive models, a single model may be generated from all of the data. In the preferred embodiment, the break trajectories are clustered into three groups: a low energy group, a medium energy group and a high energy group. A list of statistics from the clustering step of the preferred embodiment is set forth below in Table 3.

TABLE 3 Representative summary statistics of the three energy groups.

                     Whole dataset   Low energy group   Mix energy group   High energy group
# of Trajectories    102             62                 29                 11
# of Data Points     50,664          33,415             13,911             3,338
Min. of 1st PCA      2.193           2.193              2.327              2.581
Mean of 1st PCA      2.589           2.513              2.703              2.882
Max. of 1st PCA      3.508           2.867              3.508              3.234

[0097] Next, the break trajectory data of the principal components is normalized at 166. In the preferred embodiment, the data is normalized within the range of 0.1 to 0.9 to avoid saturation of the nodes on the neuro-fuzzy system input layer. The following equation may be used to normalize the data:

$$\text{normalized value} = \frac{\text{nominal value} - \text{minimum value}}{\text{maximum value} - \text{minimum value}}$$

[0098] where the minimum and maximum values are obtained across one specific field. In other words, the normalization occurs across columns of variables, as opposed to rows of data points.
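A column-wise normalization of this kind may be sketched as follows. The equation as written maps each column onto the range 0 to 1; the additional linear rescaling onto 0.1 to 0.9 through the parameters `lo` and `hi` is one assumed way of reaching the stated range and is not specified in the text. The function name `normalize_column` is also an assumption.

```python
def normalize_column(values, lo=0.1, hi=0.9):
    """Min-max normalize one column (field), then rescale onto [lo, hi].

    With lo=0.0 and hi=1.0 this reduces to the equation as written.
    """
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("column is constant; normalization is undefined")
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]
```

Applied to the column [2.0, 3.0, 4.0], the default call yields values of approximately 0.1, 0.5 and 0.9.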

[0099] The normalized data is then transformed to reduce variability at 168. In the preferred embodiment, a natural logarithm transformation is applied to the normalized data. One skilled in the art will realize, however, that other variability reducing transformations may be utilized, such as logarithms of a different base or logistic functions.

[0100] Next, the data is shuffled at 170. Through shuffling, the data is randomly permuted across all patterns. In other words, the permutation is effected across rows of data points within each modality or energy group. This enhances the ability of the neuro-fuzzy system to learn the underlying function of mapping the input states, obtained from the readings, to the desired output (time-to-break prediction) in a static way, as opposed to a dynamic way that involves time changes of these values. This results in reduced complexity and computational requirements for the system.

[0101] The data is then input into a neuro-fuzzy system in order to generate the predictive models at 172. As one skilled in the art will realize, the steps 166, 168 and 170 may be performed in any order. Further, some of these steps may be skipped, such as the normalization or log transformation, depending on the desired accuracy of the final prediction. The preferred neuro-fuzzy system is a network-based implementation of fuzzy inference, called Adaptive Network-based Fuzzy Inference System (“ANFIS”). Referring to FIG. 19, the preferred ANFIS model 177 implements the fuzzy system as a five-layer neural network so that the structure of the net can be interpreted in terms of high-level fuzzy rules. This network is then trained automatically from the data. In the system, ANFIS takes as input the paper machine variables, specifically the values of the principal components, then gives as output the predicted time-to-break for the paper web at 174 (FIG. 18).

[0102] As the data points in the training set are presented, the ANFIS model attempts to minimize the mean squared error between the network output, or predicted time-to-break, and the targeted answer, or actual time-to-break. The training method proceeds as follows:

[0103] For each pair of training patterns (input and targeted output) do

[0104] Present inputs to ANFIS and compute the output.

[0105] Compute the error between ANFIS's output and the targeted output.

[0106] Keep the IF-part parameters fixed, solve for the optimal values of the THEN-part parameters using a recursive Kalman filter method.

[0107] Compute the effect of the IF-part parameters on the error and feed it back.

[0108] Adjust the IF-part parameters based on the feedback error using a gradient descent technique.

[0109] End of “for” loop

[0110] Repeat until the error is sufficiently small.

[0111] For prediction purposes, in the preferred embodiment, only the data in the last three hours prior to a break was utilized. Recall that the median filter has a window size of 3. Therefore, each break trajectory is modeled with 60 data points at most.

[0112] For example, with the high energy group there were 552 data points for ANFIS modeling (fewer than 11 break trajectories×60 data points=660, due to incomplete break trajectories). Of the available data, 400 data points were used for training and 152 for testing. In the preferred embodiment, the ANFIS has three inputs, namely the first three principal components. Each input has two generalized bell-shaped membership functions (MF). Thus, there are 50 modifiable parameters for the specific ANFIS structure. The training of ANFIS stopped after 100 epochs and the corresponding training and testing root mean squared errors (RMSE) were 0.1063 and 0.1209, respectively. The RMSE is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left( Y_{i} - \hat{Y}_{i} \right)^{2}}{n}}$$

[0113] where Y and Ŷ

[0114] are the actual and predicted responses, respectively, and n is the total number of predictions. Table 4 summarizes ANFIS training for the three energy groups.

TABLE 4 Summary of ANFIS training for the three energy groups.

                             Low energy group         Mix energy group         High energy group
# of trajectories            62                       29                       11
# of total data              3,566                    1,609                    552
# of training data           2,566                    1,209                    400
# of testing data            1,000                    400                      152
# of inputs                  3                        3                        3
# of MFs                     4                        3                        2
Type of MF                   Generalized bell-shaped  Generalized bell-shaped  Generalized bell-shaped
# of modifiable parameters   292                      135                      50
# of epochs                  25                       25                       100
Training RMSE                0.0988                   0.0965                   0.1063
Testing RMSE                 0.1025                   0.1156                   0.1209
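The counts of modifiable parameters in Table 4 can be reproduced with a short calculation, sketched here under stated assumptions: each generalized bell-shaped membership function is taken to have three parameters, one rule is assumed for every combination of membership functions across the inputs (a grid partition), and each rule is assumed to have a first-order linear consequent with one coefficient per input plus a constant. These assumptions are not stated in the text; they are chosen because they are consistent with the tabulated counts. The function name `anfis_parameter_count` is hypothetical.

```python
def anfis_parameter_count(n_inputs, n_mfs, params_per_mf=3):
    """Count modifiable parameters of a grid-partitioned ANFIS (assumed layout).

    IF-part:   n_inputs * n_mfs membership functions, params_per_mf each.
    THEN-part: n_mfs ** n_inputs rules, each with n_inputs + 1 coefficients.
    """
    if_part = n_inputs * n_mfs * params_per_mf
    n_rules = n_mfs ** n_inputs
    then_part = n_rules * (n_inputs + 1)
    return if_part + then_part
```

With three inputs, the sketch gives 50 parameters for two membership functions per input, 135 for three and 292 for four, matching the high, mix and low energy groups respectively.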

[0115] Referring again to FIG. 18, the predicted time-to-break is processed using a trend analysis at 176. The trend analysis takes advantage of the correlation between consecutive time-to-break points. For example, the time interval between two consecutive time-to-break points is 3 minutes. If one data point represents 9 minutes to break, the next data point in time should represent 6 minutes to break and the following data point 3 minutes to break, etc. Therefore, the slope of the line that connects all these time-to-break points should be one (assuming that the x-axis and the y-axis are time and time-to-break, respectively). The same theory can be applied to the predicted value of time-to-break. That is, the slope of an imaginary line that connects predicted time-to-breaks should be close to one, given a perfect predictor. This line connecting the predicted time-to-break points is denoted as the prediction line.

[0116] In the real world, it is unlikely that the prediction would ever be perfect due to noise, faulty sensors, etc. Hence, it is unlikely that the prediction line would have a slope of one. Nevertheless, in the present invention the slope of the prediction line approaches one by recursively throwing out the “outlier” data points (those predictive data points that are far away from the prediction line) and recursively re-estimating the slope of the prediction line.

[0117] Even more importantly, the predictions will be inconsistent when the “open-loop” assumption is violated. An abrupt change in the slope indicates a strongly inconsistent prediction. These inconsistencies can be caused, among other things, by a control action applied to correct a perceived problem. The present invention is interested in predicting the time-to-break in an open-loop process, where no control action is taken. However, the data are collected in a closed-loop process, where the paper machine is controlled by the operators. Therefore, the invention needs to be able to detect when the application of control actions, which are not recorded in the data, has changed the trend of the break trajectory. In such a case, the predictive model of the present invention suspends the current prediction and resets the prediction history. This step eliminates many false positives.

[0118] For example, a moving window of a predetermined size, such as ten, may be utilized. Then, the slope and the intercept of the prediction line are estimated by least mean squares. After that, a predetermined number of outliers to the line, such as 2 to 4 or preferably 3, are dropped. Then, the slope and intercept of the prediction line are re-estimated with the remaining data points, which in this example are seven data points. The window is advanced in time and the above slope and intercept estimation process is repeated. As a result, two time-series, one of slopes and one of intercepts, are obtained.
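The windowed estimation with outlier removal may be sketched as follows, as an illustration only. The sketch uses an ordinary least-squares line fit, drops the points with the largest absolute residuals in a single pass and then refits; dropping all outliers at once rather than one at a time, and the function names `fit_line`, `robust_fit` and `sliding_fits`, are assumptions. The sketch does not fix a sign convention for the slope and does not cover the subsequent comparison of consecutive slopes.

```python
def fit_line(points):
    """Ordinary least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def robust_fit(points, n_outliers=3):
    """Fit, drop the n_outliers points farthest from the line, then refit."""
    slope, intercept = fit_line(points)
    by_residual = sorted(
        points, key=lambda p: abs(p[1] - (slope * p[0] + intercept)))
    kept = by_residual[:len(points) - n_outliers]
    return fit_line(kept)

def sliding_fits(points, window=10, n_outliers=3):
    """(slope, intercept) for each position of the moving window."""
    return [robust_fit(points[i:i + window], n_outliers)
            for i in range(len(points) - window + 1)]
```

Each call to `robust_fit` on a ten-point window re-estimates the line from seven points, and `sliding_fits` returns the paired series of slopes and intercepts as the window advances.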

[0119] Then, two consecutive slopes are compared to see how far away they are from one, which would be a perfect prediction. If they are within a pre-specified tolerance band, e.g. 0.1, then the average of the two intercepts is utilized as the predicted time-to-break. Otherwise, a calculation is performed to obtain a modified average of the two consecutive slopes and intercepts to readjust these estimates. In this way, the prediction is continuously adjusted according to the slope and intercept estimation.

[0120] FIG. 20 shows the prediction results of four typical break trajectories 181, 183, 185 and 187 from the low energy group. In the figure, the x-axis and y-axis represent prediction points and time-to-break in minutes, respectively. The dashed line 180 represents the target or actual time-to-break, while the circle points 182 and the star points 184 represent the time-to-break point prediction and the moving average of the point prediction, respectively. The final prediction is an (equally) weighted average of the point prediction (typically overestimating the target) with the moving average (typically underestimating the target).

[0121] A performance analysis comparing predicted versus actual time-to-break is performed at 178 (FIG. 18). The Root Mean Squared Error (RMSE), defined above, is a typical average measure of the modeling error. However, the RMSE does not have an intuitive interpretation that may be used to judge the relative merits of the model. Therefore, additional performance metrics may be used in the evaluation of the time-to-break predictor. In the preferred embodiment, and referring to FIGS. 21-23, the following metrics are utilized:

[0122] Distribution of false predictions 191: E(60)

[0123] False positives are predictions that were made too early (i.e., more than 40 minutes early). Therefore, time-to-break predictions of more than 100 minutes (at time=60) fall into this category. False negatives are missing predictions or predictions that were made too late (i.e., more than 20 minutes late). Therefore, time-to-break predictions of less than 40 minutes (at time=60) fall into this category.

[0124] Distribution of prediction accuracy 193: RMSE

[0125] Prediction accuracy is defined as the root mean squared error (RMSE) for a break trajectory.

[0126] Distribution of error in the final prediction 195: E(0)

[0127] The final prediction by the model is generally associated with high confidence and better accuracy. The final prediction is associated with the prediction error at break time, i.e., E(0).

[0128] Distribution of the earliest non false positive prediction 197

[0129] The first prediction by the predictor is generally associated with high sensitivity.

[0130] Distribution of the maximum absolute deviance in prediction 199

[0131] This is equivalent to the worst-case scenario. It shows the histogram of the maximum error by the predictor.
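Several of the metrics listed above may be computed per break trajectory as sketched below. The sketch assumes that the error E(t) is the predicted time-to-break minus the actual time-to-break at time t, which is consistent with the thresholds of 100 and 40 minutes given for time=60 but is not stated explicitly in the text. The function names and the dictionary keys are hypothetical, and the earliest non false positive prediction is omitted.

```python
import math


def classify_at_alert(predicted_ttb, actual_ttb=60.0,
                      early_margin=40.0, late_margin=20.0):
    """Classify the prediction made when the actual time-to-break is 60 minutes.

    A missing prediction (None) is reported as "missed"; in the text, missed
    and late predictions are both counted as false negatives.
    """
    if predicted_ttb is None:
        return "missed"
    error = predicted_ttb - actual_ttb  # assumed sign convention for E(60)
    if error > early_margin:
        return "false_positive"   # predicted more than 100 minutes
    if error < -late_margin:
        return "false_negative"   # predicted less than 40 minutes
    return "correct"


def trajectory_metrics(actual, predicted):
    """RMSE, final error E(0) and maximum absolute deviance for one trajectory.

    Both lists are ordered in time and end at the break (actual value 0).
    """
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {
        "rmse": rmse,
        "final_error": errors[-1],
        "max_abs_deviance": max(abs(e) for e in errors),
    }
```

Collecting these values over all trajectories of a group gives the data from which histograms of the kind shown in FIGS. 21-23 could be drawn.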

[0132] FIGS. 21-23 show the resultant performance distributions of the high 201, mix 203 and low 205 energy groups, respectively. Of the three groups, the high energy group is the least reliable one, since the model was trained with only 11 trajectories. Referring to FIG. 21, based on the first histogram, which shows the distribution of E(60), it is noted that out of eleven trajectories, seven are correctly classified and four break trajectories are undetected (false negative). The relatively high percentage of false negatives in this group is due, in part, to the extremely low number of trajectories available to train the model for this group. The reliability and coverage of the prediction will increase with the size of the training set, as illustrated by the next two groups. Referring to FIG. 22, the mix energy group exhibits an improvement in the quality of the prediction, when compared with the high energy group, since the predictive model was trained on 29 trajectories (instead of 11). It is noted from the first histogram, which shows the distribution of E(60), that out of 29 trajectories, the model has 22 correctly classified. Three more trajectories are misclassified (2 false positive and 1 false negative) and only four break trajectories are undetected (false negative).

[0133] Referring to FIG. 23, the low energy group exhibits the best prediction quality, since the predictive model was trained on 62 break trajectories. It is noted from the first histogram, which shows the distribution of E(60), that out of 62 trajectories, the model correctly classifies 51 trajectories. Five more trajectories are misclassified (3 false positive and 2 false negative) and only six break trajectories are undetected (false negative).

[0134] It should be noted that some of the false positives can be attributed to the closed-loop nature of the data: the human operators are closing the loop and trying to prevent possible breaks, while the model is making the prediction in open-loop, assuming no human intervention.

[0135] Two of the more important figures are the first and third histograms in each of FIGS. 21-23, showing the distribution of E(60) and E(0), i.e., the distribution of the prediction error at the time of the alert (red zone) and at the time of the break. An analysis of the predictions is illustrated in Tables 5 and 6 below:

TABLE 5
Analysis of the Histograms E(60)

                           False        False        False        Coverage:        Relative Accuracy:  Global Accuracy:
             Number of     Negative:    Negative:    Positive:    Number of        Correct             Correct
             Trajectories  Missed       Late         Early        Predictions per  Predictions per     Predictions per
E(60)                      Predictions  Predictions  Predictions  Trajectory       Prediction          Trajectory
Low Energy   11            4            0            0            7/11 = 63.6%     7/7 = 100.0%        7/11 = 63.6%
Mix Energy   29            4            1            2            25/29 = 86.2%    22/25 = 88.0%       22/29 = 75.9%
High Energy  62            6            2            3            56/62 = 90.3%    51/56 = 91.1%       51/62 = 82.3%
Total        102           14           3            5            88/102 = 86.3%   80/88 = 90.9%       80/102 = 78.4%

[0136] TABLE 6
Analysis of the Histograms E(0) - Final Error

                           False        False        False        Coverage:        Relative Accuracy:  Global Accuracy:
             Number of     Negative:    Negative:    Positive:    Number of        Correct             Correct
             Trajectories  Missed       Late         Early        Predictions per  Predictions per     Predictions per
E(0)                       Predictions  Predictions  Predictions  Trajectory       Prediction          Trajectory
Low Energy   11            4            1            0            7/11 = 63.6%     6/7 = 85.7%         6/11 = 54.5%
Mix Energy   29            4            0            2            25/29 = 86.2%    23/25 = 92.0%       23/29 = 79.3%
High Energy  62            6            0            4            56/62 = 90.3%    52/56 = 92.9%       52/62 = 83.9%
Total        102           14           1            6            88/102 = 86.3%   81/88 = 92.0%       81/102 = 79.4%

[0137] The two histograms show a similar behavior of the error between time=60 and time=0. The variance of the error at the time of the break (t=0) is slightly smaller than at the time of the alarm (t=60 minutes). Overall, the models show a very robust performance. Furthermore, the models slightly overestimate the time-to-break: the mean of the distribution of the final error, E(0), is around 20 minutes (i.e., the models tend to predict the break 20 minutes earlier than it actually occurs). Finally, in analyzing the histograms of the earliest final prediction for the three models, it is noted that reliable predictions are made, on average, 140-150 minutes before the break occurs.

[0138] Thus, the model generated by the process performed quite well. Out of a total of 102 break trajectories, 88 predictions were made, of which 80 were correct (according to the lower and upper limits established for the prediction error at time=60, i.e., E(60)). This corresponds to a prediction coverage of 86.3% of all trajectories. The relative accuracy, defined as the ratio of correct predictions to the total number of predictions made, was 90.9%. The global accuracy, defined as the ratio of correct predictions to the total number of trajectories, was 78.4%. In summary, we have developed a process that generates a very accurate model that minimizes false alarms (FP) while still providing an adequate coverage of the different types of breaks caused by unknown causes.
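The arithmetic behind the coverage and accuracy figures may be reproduced as follows. The sketch takes the counts reported for E(60) and derives the three ratios from them; the function name and the returned dictionary keys are hypothetical. It assumes, consistent with the tables, that the number of predictions made is the number of trajectories minus the missed predictions, and that the correct predictions are those that are neither late nor early.

```python
def summarize(trajectories, missed, late, early):
    """Coverage, relative accuracy and global accuracy from raw counts."""
    predictions = trajectories - missed
    correct = predictions - late - early
    return {
        "predictions": predictions,
        "correct": correct,
        "coverage_pct": round(100.0 * predictions / trajectories, 1),
        "relative_accuracy_pct": round(100.0 * correct / predictions, 1),
        "global_accuracy_pct": round(100.0 * correct / trajectories, 1),
    }


# Totals reported for E(60): 102 trajectories, 14 missed, 3 late, 5 early.
totals = summarize(trajectories=102, missed=14, late=3, early=5)
print(totals)
```

For the totals, this yields 88 predictions, 80 correct predictions, and percentages of 86.3, 90.9 and 78.4, in agreement with the figures stated above.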

[0139] The predictive models are preferably maintained over time to guarantee that they are tracking the dynamic behavior of the underlying papermaking process. Therefore, it is suggested to repeat the steps of the model generation process every time that the statistics for coverage and/or accuracy deviate considerably from the ones experienced in building the running model. It is also suggested to reapply the model generation process every time that twenty new break trajectories with unknown causes are acquired.
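The maintenance policy suggested above may be expressed as a simple check, sketched below. The text does not quantify what a considerable deviation is, so the threshold of ten percentage points is purely an assumption for illustration, as are the function and parameter names. Only the batch size of twenty new break trajectories comes from the text.

```python
def should_regenerate(baseline_coverage, baseline_accuracy,
                      current_coverage, current_accuracy,
                      new_unknown_breaks,
                      max_drift=0.10, batch_size=20):
    """Return True when the model generation process should be repeated.

    Coverage and accuracy are fractions between 0 and 1. max_drift is a
    hypothetical stand-in for "deviate considerably".
    """
    drifted = (
        abs(current_coverage - baseline_coverage) > max_drift
        or abs(current_accuracy - baseline_accuracy) > max_drift
    )
    return drifted or new_unknown_breaks >= batch_size
```

In practice, the drift threshold would be chosen by the mill operators based on how much degradation in coverage or accuracy is tolerable.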

[0140] As mentioned earlier, the rules from the model can be used to isolate the root cause of any predicted web break. In particular, in predicting the paper web time-to-break in the paper machine, the rule set may be utilized to determine that the root cause of this predicted break may be due to certain sensor measurements not being within a certain range. Therefore, the paper machine may be proactively adjusted to prevent a web break.

[0141] The following is a list of software tools that may be utilized for the processes of the present invention:

[0142] 1 Data scrubbing: the Excel™ software program or the MATLAB™ software program (to read files); SAS™ software program (to scrub data files)

[0143] 2 Data segmentation: SAS™ software program

[0144] 3 Variable selection: SAS™ software program; S+ CART™ software program; Excel™ software program or MATLAB™ software program (to visualize variables over time)

[0145] 4 Principal Components Analysis (PCA): SAS™ software program

[0146] 5 Filtering: MATLAB™ software program

[0147] 6 Smoothing: MATLAB™ software program

[0148] 7 Clustering: SAS™ software program

[0149] 8 Normalization: GNU C™ software program

[0150] 9 Transformation: MATLAB™ software program

[0151] 10 Shuffling: GNU C™ software program

[0152] 11 ANFIS: GNU C™ software program

[0153] 12 Trending: MATLAB™ software program

[0154] 13 Performance analysis: MATLAB™ software program

[0155] As one skilled in the art will realize, other similar software may be utilized to produce similar results, such as the Splus™ program, the Mathematica™ software program and the MiniTab™ software program.

[0156] Although this invention has been described with reference to predicting the time-to-break and isolating the root cause of the break in the wet-end section of the paper machine, this invention is not limited thereto. In particular, this invention can be used to predict the time-to-break of a paper web and isolate the root cause in other sections of the paper machine, such as the dry-end section and the press section.

[0157] It is therefore apparent that there has been provided in accordance with the present invention, a system and method for predicting a time-to-break of a paper web in a paper machine that fully satisfy the aims, advantages and objectives hereinbefore set forth. The invention has been described with reference to several embodiments; however, it will be appreciated that variations and modifications can be effected by a person of ordinary skill in the art without departing from the scope of the invention.

What is claimed is:
 1. A system for predicting a paper web break in a paper machine located about a paper mill, comprising: a paper mill database containing a plurality of measurements obtained from the paper mill, each of the plurality of measurements relating to a predetermined paper machine variable; a processor for processing each of the plurality of measurements into modified break sensitivity data; and a break predictor responsive to the processor for predicting a time-to-break of the paper web from the plurality of processed measurements.
 2. The system according to claim 1, wherein the break predictor comprises a predictive model.
 3. The system according to claim 2, wherein the predictive model comprises a neuro-fuzzy system.
 4. The system according to claim 2, wherein the predictive model comprises an adaptive network-based fuzzy inference system.
 5. The system according to claim 4, wherein the adaptive network-based fuzzy inference system is trained with historical web break data.
 6. The system according to claim 1, wherein the modified break sensitivity data comprise time-based transformations of the plurality of measurements.
 7. The system according to claim 1, wherein the modified break sensitivity data comprise principal components of the plurality of measurements.
 8. The system according to claim 1, wherein the break sensitivity data comprise noise-reduced and feature-enhanced transformations of the plurality of measurements.
 9. The system according to claim 1, further comprising a fault isolator responsive to the break predictor for determining the paper machine variables affecting the predicted time-to-break of the paper web.
 10. The system according to claim 9, wherein the fault isolator comprises an adaptive network-based fuzzy inference model having a set of rules linking paper machine variables to the predicted time-to-break of the paper web.
 11. The system according to claim 9, wherein the fault isolator isolates the paper machine variables that are root causes for the predicted time-to-break of the paper web.
 12. The system according to claim 1, further comprising an indicator mechanism for updating the status of the machine by indicating the predicted paper web time-to-break.
 13. The system according to claim 1, further comprising a feedback mechanism for adjusting the performance of the break predictor.
 14. The system according to claim 1, wherein the processor further processes the predicted time-to-break and prior predicted times-to-break into a final predicted time-to-break.
 15. The system according to claim 1, wherein the plurality of measurements contained in the paper mill database are generated from various processes occurring within the paper mill.
 16. The system according to claim 1, wherein the paper mill database comprises a raw materials database, a preprocess database, a paper machine database, an operation shift database and a maintenance schedule database.
 17. A system for predicting a paper web break in a paper machine located about a paper mill, comprising: a paper mill database containing a plurality of measurements from the paper mill, each of the plurality of measurements relating to a predetermined paper machine variable; a processor for processing each of the plurality of measurements into modified break sensitivity data comprising time-based transformations of the plurality of data; and a break predictor responsive to the processor for predicting a time-to-break of the paper web from the plurality of processed measurements, wherein the break predictor comprises a predictive model.
 18. The system according to claim 17, wherein the predictive model comprises a neuro-fuzzy system.
 19. The system according to claim 18, wherein the predictive model comprises an adaptive network-based fuzzy inference system.
 20. The system according to claim 19, wherein the modified break sensitivity data comprise principal components of the plurality of measurements.
 21. The system according to claim 20, further comprising a fault isolator that isolates the paper machine variables that are root causes for the predicted time-to-break of the paper web.
 22. The system according to claim 20, further comprising an indicator mechanism for updating the status of the paper machine by indicating the predicted paper web time-to-break.
 23. The system according to claim 20, further comprising a feedback mechanism for adjusting the performance of the break predictor.
 24. The system according to claim 20, wherein the processor further processes the predicted time-to-break and prior predicted times-to-break into a final predicted time-to-break.
 25. The system according to claim 17, wherein the plurality of measurements contained in the paper mill database are generated from various processes occurring within the paper mill.
 26. The system according to claim 17, wherein the paper mill database comprises a raw materials database, a preprocess database, a paper machine database, an operation shift database and a maintenance schedule database.
 27. A method for predicting a paper web break in a paper machine located about a paper mill, comprising: obtaining a plurality of measurements from the paper mill, each of the plurality of measurements relating to a predetermined paper machine variable; processing each of the plurality of measurements into modified break sensitivity data; and predicting a time-to-break for the paper web within the paper machine from the plurality of processed measurements.
 28. The method according to claim 27, wherein predicting the time-to-break for the paper web comprises applying a predictive model.
 29. The method according to claim 27, wherein predicting the time-to-break for the paper web comprises applying a neuro-fuzzy system.
 30. The method according to claim 27, wherein predicting the time-to-break for the paper web comprises applying an adaptive network-based fuzzy inference system.
 31. The method according to claim 27, further comprising training the adaptive network-based fuzzy inference system with historical web break data.
 32. The method according to claim 31, further comprising testing the trained adaptive network-based fuzzy inference system with the historical break data to test how well the system predicts the time-to-break.
 33. The method according to claim 31, wherein the training comprises preprocessing the historical web break data.
 34. The method according to claim 33, wherein the preprocessing comprises: reducing the quantity of the historical web break data; reducing the number of variables contained in the historical web break data; transforming the values of the historical web break data; enhancing features that affect web break sensitivity from the historical web break data; and generating the adaptive network-based fuzzy inference system to predict the time-to-break.
 35. The method according to claim 27, wherein the processing of the plurality of measurements into modified break sensitivity data further comprises time-based transformations of the plurality of measurements.
 36. The method according to claim 27, wherein the processing of the plurality of measurements into modified break sensitivity data further comprises transforming the plurality of measurements into principal components for web breakage.
 37. The method according to claim 27, further comprising processing the predicted time-to-break and prior predicted times-to-break into a final predicted time-to-break.
 38. The method according to claim 27, further comprising adjusting the predicting of the time-to-break based on an analysis of the performance of the predicted time-to-break.
 39. The method according to claim 27, further comprising updating the status of the paper machine by indicating the predicted time-to-break.
 40. The method according to claim 27, further comprising isolating the paper machine variables affecting the predicted time-to-break.
 41. The method according to claim 27, wherein the obtaining of the plurality of measurements comprises receiving measurements generated from various processes occurring within the paper mill.
 42. A method for predicting a paper web break in a paper machine located about a paper mill, comprising: obtaining a plurality of measurements from the paper mill, each of the plurality of measurements relating to a predetermined paper machine variable; performing a time-based transformation of each of the plurality of measurements to produce modified break sensitivity data; and predicting a time-to-break for the paper web within the paper machine from the plurality of processed measurements by applying a predictive model.
 43. The method according to claim 42, wherein predicting the time-to-break for the paper web comprises applying a neuro-fuzzy system.
 44. The method according to claim 42, wherein predicting the time-to-break for the paper web comprises applying an adaptive network-based fuzzy inference system.
 45. The method according to claim 44, further comprising training the adaptive network-based fuzzy inference system with historical web break data.
 46. The method according to claim 45, further comprising testing the trained adaptive network-based fuzzy inference system with the historical break data to test how well the system predicts the time-to-break.
 47. The method according to claim 44, wherein performing the time-based transformation of the plurality of measurements into modified break sensitivity data further comprises transforming the plurality of measurements into principal components for web breakage.
 48. The method according to claim 47, further comprising processing the predicted time-to-break and prior predicted times-to-break into a final predicted time-to-break.
 49. The method according to claim 48, further comprising adjusting the predicting of the time-to-break based on an analysis of the performance of the predicted time-to-break.
 50. The method according to claim 49, further comprising updating the status of the paper machine by indicating the predicted time-to-break.
 51. The method according to claim 50, further comprising isolating the paper machine variables affecting the predicted time-to-break.
 52. The method according to claim 42, wherein the obtaining of the plurality of measurements comprises receiving measurements generated from various processes occurring within the paper mill.